An approach to discover and recommend cross-domain bridge-keywords in document banks
Abstract
Purpose – Co-word analysis is commonly used to cluster related keywords into a single keyword domain. As a consequence, traditional co-word analysis cannot assign the same keyword to more than one keyword domain, and it disregards the multi-domain property of keywords. This paper proposes an innovative keyword co-citation approach called the “Complete Keyword Pair (CKP) method”, which groups the complete keyword sets of reference papers into clusters and thereby finds keywords that belong to more than one keyword domain, namely bridge-keywords.
Design/methodology/approach – The approach treats all author keywords of a paper as one complete keyword set and uses these sets to compute the relations among keywords. Any two complete keyword sets whose corresponding papers are co-referenced by the same paper are recorded as a CKP. A clustering method is then applied to the correlation matrix computed from the CKP frequency counts in order to cluster the complete keyword sets. Because a keyword may occur in more than one complete keyword set, the same keyword may appear in different clusters.
Findings – The results show that the CKP method discovers bridge-keywords with an average precision of 80 per cent in the Journal of the Association for Computing Machinery citation bank for 2000-2006, measured against the Association for Computing Machinery Computing Classification System as the benchmark.
Originality/value – Traditional co-word analysis focuses on the co-occurrence of keywords and therefore cannot cluster the same keyword into more than one keyword domain. The CKP approach instead considers the complete author keyword sets of reference papers to discover bridge-keywords. A keyword recommendation system based on CKP can therefore recommend keywords across multiple keyword domains via the bridge-keywords.
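The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper identifiers, keyword sets, and function names (`count_ckps`, `cluster_papers`, `bridge_keywords`) are hypothetical, and the clustering step is a stand-in, since the abstract does not name the clustering method. Here the correlation matrix is skipped and papers are simply grouped whenever their CKP count reaches an assumed threshold `min_count`.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_ckps(citing_to_refs):
    """Count CKPs: pairs of reference papers cited together by the same paper."""
    counts = Counter()
    for refs in citing_to_refs.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_papers(papers, ckp_counts, min_count=1):
    """Stand-in clustering (assumption): union papers whose CKP count >= min_count."""
    parent = {p: p for p in papers}

    def find(p):
        # Follow parent links to the cluster representative, halving the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), n in ckp_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in papers:
        clusters[find(p)].add(p)
    return list(clusters.values())


def bridge_keywords(clusters, keyword_sets):
    """Return keywords whose complete keyword sets fall in more than one cluster."""
    seen = defaultdict(set)
    for idx, cluster in enumerate(clusters):
        for paper in cluster:
            for kw in keyword_sets[paper]:
                seen[kw].add(idx)
    return {kw for kw, idxs in seen.items() if len(idxs) > 1}


# Hypothetical toy data: complete author keyword sets of four reference papers.
keyword_sets = {
    "R1": {"clustering", "text mining"},
    "R2": {"clustering", "co-word analysis"},
    "R3": {"clustering", "image segmentation"},
    "R4": {"image segmentation", "computer vision"},
}
# Hypothetical citing papers and the references each one cites.
citing_to_refs = {"P1": ["R1", "R2"], "P2": ["R1", "R2"], "P3": ["R3", "R4"]}

ckp_counts = count_ckps(citing_to_refs)
clusters = cluster_papers(keyword_sets, ckp_counts)
bridges = bridge_keywords(clusters, keyword_sets)
print(bridges)
```

In this toy data, "clustering" is the only keyword whose papers land in two different clusters, so it is reported as a bridge-keyword. Clustering whole keyword sets rather than individual keywords is what allows one keyword to surface in several domains.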
Journal title:
- The Electronic Library
Volume 28, Issue
Pages -
Publication date 2010